A fast, single-iteration ensemble Kalman smoother for online, sequential data assimilation

Authors:

Colin Grudzien1, Marc Bocquet2

Special Thanks:

Eric Olson1 & Mihye Ahn1 for computing logistics and support

  1. University of Nevada, Reno, Department of Mathematics and Statistics
  2. CEREA, A joint laboratory École des Ponts Paris Tech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France.

Outline

  • Motivation
  • A review of the ensemble transform Kalman filter (ETKF) in perfect, hidden Markov models
  • Fixed-lag smoothing
  • The classic ensemble transform Kalman smoother (ETKS)
  • The 4D smoothing cost function
    • Iterative analysis and the iterative ensemble Kalman smoother (IEnKS)
  • An alternative formulation of iterative smoothing for weakly nonlinear forecast error dynamics:
    • The single-iteration ensemble transform Kalman smoother (SIETKS)
  • Numerical simulations

Motivation

  • Iterative, ensemble-variational methods form the basis of the state-of-the-art in nonlinear, scalable data assimilation (DA)1

    • Methods following an ensemble Kalman filter (EnKF) style analysis include the ensemble randomized maximum likelihood method (EnRML)2 and the iterative ensemble Kalman smoother (IEnKS)3
  • These iterative, ensemble-variational methods combine the benefits of:
    1. the high accuracy of the iterative solution of the Bayesian maximum a posteriori (MAP) formulation of the DA problem;
    2. the relative simplicity of numerical model development and maintenance in ensemble-based DA;
    3. the ensemble-based analysis of time-dependent errors and possibly discontinuous, physical model parameters; and
    4. the variational analysis, optimizing hyper-parameters for, e.g., inflation4 and localization5 to augment the estimation scheme.
1. Bannister, R. (2017). A review of operational methods of variational and ensemble‐variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 143(703), 607-633.
2. Chen, Y., & Oliver, D. (2012). Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Mathematical Geosciences, 44(1), 1-26.
3. Bocquet, M., & Sakov, P. (2014). An iterative ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society, 140(682), 1521-1535.
4. Bocquet, M., Raanes, P., & Hannart, A. (2015). Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation. Nonlinear Processes in Geophysics, 22(6), 645-662.
5. Lorenc, A. C. (2003). The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quarterly Journal of the Royal Meteorological Society, 129(595), 3183-3203.

Motivation

  • However, a barrier to the use of iterative, ensemble-variational schemes in online, sequential DA still lies in the computational bottleneck of simulating the nonlinear, physics-based model to perform the sampling procedure.

  • Many iterative smoothing procedures require multiple runs of the ensemble forecast to produce the forecast, filtering and re-analyzed smoothing statistics.

  • For online applications in which there is a short operational time-window to produce a forecast for future states, the iterative ensemble simulations may be prohibitively expensive.

  • Particularly in cases where nonlinearity in the DA cycle is not dominated by the error dynamics in the ensemble forecast,

    • i.e., if the tangent-linear evolution of Gaussian error distributions is an adequate approximation,
  • the cost of an iterative, ensemble forecast solution may not be justified by the improvement in forecast accuracy.

  • Nonlinearity in the DA cycle may instead be dominated by:

    • nonlinearity in the observation operator,
    • nonlinearity in hyper-parameter optimization, and / or
    • nonlinearity in temporally interpolating a re-analyzed solution over a long window of past states.
  • Particularly for sequential forecast cycles in which the linear-Gaussian approximation of the forecast error may be appropriate, like synoptic meteorology,

    • a different form of iterative, ensemble-variational smoother may have substantial advantages in a computational cost / forecast accuracy tradeoff.

Motivation

  • We consider a simple hybridization of the classic ensemble transform Kalman smoother (ETKS) with the iterative ensemble Kalman smoother (IEnKS) to produce a single-iteration, fixed-lag, sequential smoother.

  • We combine:

    1. the filtering solution and retrospective analysis as in the classical EnKS; and
    2. a single iteration of the ensemble forecast over the DAW with the improved, re-analyzed prior;
  • to improve the EnKF filter and forecast statistics in the subsequent DA cycle.

  • The resulting scheme is a “single-iteration” ensemble Kalman smoother that sequentially solves the filtering Bayesian MAP cost function;

    • the scheme produces the forecast, filtering and re-analyzed smoothing statistics within a single pass of the ensemble forecast over the DAW.
  • This scheme seeks to minimize the leading order cost of ensemble-variational smoothing, i.e., the ensemble forecast over the DAW.

    • However, we are free to iteratively solve the filtering cost function for any single observation without iterations of the ensemble forecast.
  • This is an outer-loop optimization of the cost of fixed-lag, sequential smoothing, designed for online applications with weakly nonlinear forecast error dynamics.

  • We will denote the general scheme a “single-iteration” smoother, while the specific implementation presented here is denoted the single-iteration ensemble transform Kalman smoother (SIETKS).

Motivation

  • For linear-Gaussian systems with the perfect model hypothesis, the SIETKS is a fully consistent Bayesian estimator, albeit one that uses redundant model simulations.

  • When the forecast error dynamics are weakly nonlinear, yet other aspects of the DA cycle are strongly nonlinear,

    • e.g., the filtering cost function is nonlinear due to hyper-parameters or nonlinearity in the observation operator,
  • we demonstrate the SIETKS has predictive performance comparable with fully iterative methods.

    • However, the SIETKS has a numerical cost that scales with matrix inversions in the ensemble dimension, rather than the cost of the ensemble forecast.
  • We furthermore derive a method of multiple data assimilation (MDA) for this hybrid smoother scheme;

    • the MDA scheme handles the increasing nonlinearity of the temporal interpolation of the posterior over long lag windows or large shifts of the smoothing solution.
  • The result is a two-stage algorithm, estimating the forecast, filtering and re-analyzed smoothing statistics, and shifting the smoother forward in time, with two sweeps of the ensemble forecast over the lagged states.

Perfect hidden Markov models

  • For simplicity in this work, we will utilize the perfect model assumption, but generalizing the techniques for the presence of model errors is a subject of ongoing work.
  • Define a perfect physical process model and observation model, \[ \begin{align} \pmb{x}_k &= \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right) & & \pmb{x}_k\in\mathbb{R}^{N_x}\\ \pmb{y}_k &= \mathcal{H}_k \left(\pmb{x}_k\right) + \boldsymbol{\epsilon}_k & & \pmb{y}_k\in\mathbb{R}^{N_y} \end{align} \]
  • Let us denote the sequence of the process model states and observation model states as, \[ \begin{align} \pmb{x}_{m:k} &:= \left\{\pmb{x}_m, \pmb{x}_{m+1}, \cdots, \pmb{x}_k\right\}, \\ \pmb{y}_{m:k} &:= \left\{\pmb{y}_m, \pmb{y}_{m+1}, \cdots, \pmb{y}_k\right\}. \end{align} \]
  • We assume that the sequence of observation errors \[ \begin{align} \{\boldsymbol{\epsilon}_k : k=1,\cdots, K\} \end{align} \] is independent in time.
  • In the above configuration, we will say that the transition “density” is given in terms of \[ \begin{align} p(\pmb{x}_k \vert \pmb{x}_{k-1} ) \propto \pmb{\delta} \left\{\pmb{x}_k - \mathcal{M}_k\left(\pmb{x}_{k-1}\right)\right\}\label{eq:dirac_distribution} \end{align} \] where \( \pmb{\delta} \) represents the Dirac distribution.
  • The filtering problem is to sequentially and recursively estimate, \[ \begin{align} p(\pmb{x}_L \vert \pmb{y}_{L:1}) \end{align} \] given:
    • an initial prior \( p(\pmb{x}_0) \); and
    • the observation likelihoods \( p(\pmb{y}_k \vert \pmb{x}_k ) \).
  • This can be described in terms of a Bayesian inference problem.

The observation-analysis-forecast cycle

Diagram of the filter observation-analysis-forecast cycle.

  • Sequentially and recursively estimating the physical state \( \pmb{x}_L \) can be described as an observation-analysis-forecast cycle.
  • Using Bayes' law, this can be written as \[ \begin{align} {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)} } &\propto {\color{#7570b3} { p\left(\pmb{y}_L \vert \pmb{x}_L\right) } } {\color{#1b9e77} { p\left(\pmb{x}_L\vert \pmb{y}_{1:L-1}\right) } } \end{align} \]
  • When the models are linear, i.e., \[ \begin{align} \pmb{x}_k &= \mathbf{M}_k \pmb{x}_{k-1};\\ \pmb{y}_k &= \mathbf{H}_k \pmb{x}_k + \pmb{\epsilon}_k; \end{align} \]
  • and the error densities are Gaussian, i.e., \[ \begin{align} p(\pmb{x}_0) = N\left(\overline{\pmb{x}}_0, \mathbf{B}_0\right) & & p\left(\pmb{y}_k \vert \pmb{x}_k \right) = N\left(\mathbf{H}_k\pmb{x}_k, \mathbf{R}_k\right); \end{align} \]
  • the posterior density is Gaussian at all times, \[ {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)\equiv N\left(\overline{\pmb{x}}^\mathrm{filt}_L, \mathbf{B}_L^\mathrm{filt}\right)}} \] and can be computed analytically in a recursive formulation by the Kalman filter.
  • This describes a recursion on the first two moments of the posterior, parameterizing the distribution.
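As an illustration of this recursion on the first two moments, here is a minimal sketch of one linear-Gaussian Kalman filter cycle; the function and variable names are illustrative, not from a particular implementation:

```python
import numpy as np

def kalman_filter_step(x_mean, B, M, H, R, y):
    """One forecast-analysis cycle of the linear-Gaussian Kalman filter.

    M is the linear model, H the observation operator, R the observation
    error covariance (all illustrative placeholders)."""
    # Forecast step: propagate the mean and covariance with the linear model.
    x_fore = M @ x_mean
    B_fore = M @ B @ M.T
    # Analysis step: Kalman gain, then the posterior (filtering) moments.
    K = B_fore @ H.T @ np.linalg.inv(H @ B_fore @ H.T + R)
    x_filt = x_fore + K @ (y - H @ x_fore)
    B_filt = (np.eye(len(x_fore)) - K @ H) @ B_fore
    return x_filt, B_filt
```

With identity model, observation operator, and unit covariances, the analysis mean is the simple average of the forecast mean and the observation, as expected from the Gaussian product.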
  • The ETKF

    • With Gaussian error densities,

      \[ \begin{align} p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) = N\left(\overline{\pmb{x}}_L^\mathrm{fore}, \mathbf{B}_L^\mathrm{fore}\right) & & p\left(\pmb{y}_L \vert \pmb{x}_L \right) = N\left(\mathbf{H}_L \pmb{x}_L, \mathbf{R}_L\right) \end{align} \] this can be written as a least-squares optimization as follows.

    • Suppose that an ensemble \( \mathbf{E}_{L}\in \mathbb{R}^{N_x \times N_e} \) is drawn iid from \( p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) \).

    • Define the ensemble-based, empirical estimates,

      \[ \begin{align} & & \hat{\pmb{x}}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}_L^\mathrm{fore}\right)\\ &&\mathbf{X}_L^\mathrm{fore} &= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ & &\mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}. \end{align} \]

    • Maximizing the empirical posterior density is thus equivalent to minimizing the cost function

      \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} + \mathbf{X}^\mathrm{fore}_L \pmb{w} - \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\left(\mathbf{P}^\mathrm{fore}_L\right)^{-1}}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}^{-1}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \) rather than in the state space dimension \( N_x \).

    The ETKF

    • Setting the gradient \( \nabla_{\pmb{w}} \widetilde{\mathcal{J}} \) equal to zero, we find the minimizer \( \overline{\pmb{w}} \):

      \[ \begin{align} \overline{\pmb{w}} = \pmb{0} - {\widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}}^{-1} \nabla_{\pmb{w}} \widetilde{\mathcal{J}}, \end{align} \] where \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}:= \nabla^2_{\pmb{w}} \widetilde{\mathcal{J}} \) is the Hessian of the cost function.

      • This corresponds to a single iteration of Newton's descent algorithm, yielding \[ \overline{\pmb{x}}_L^\mathrm{filt} := \overline{\pmb{x}}_{L}^\mathrm{fore} + \mathbf{X}_L^\mathrm{fore} \overline{\pmb{w}} \]
    • If we define a right-transform matrix,

      \[ \begin{align} \mathbf{T}:= \sqrt{N_e - 1}\, \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}} \end{align} \]

    • we similarly have the update for the covariance as

      \[ \begin{align} \mathbf{P}^\mathrm{filt}_L = \left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)\left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)^\top /(N_e - 1). \end{align} \]

    • The numerical cost of the Newton step and the transform \( \mathbf{T} \) is dominated by the eigenvalue decomposition of \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).

    • This sketches the ensemble transform Kalman filter recursion, as in the derivation of the local ensemble transform Kalman filter (LETKF)5.

    5. Hunt, B.R., Kostelich, E.J., & Szunyogh, I. (2007). Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena, 230(1-2), 112-126.
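The weight-space analysis above can be illustrated with a minimal, toy implementation; all names here are illustrative assumptions, and the transform uses the convention \( \mathbf{T} = \sqrt{N_e - 1}\,\widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-1/2} \) so that \( \mathbf{P}^\mathrm{filt} = (\mathbf{X}\mathbf{T})(\mathbf{X}\mathbf{T})^\top/(N_e-1) \):

```python
import numpy as np

def etkf_analysis(E_fore, H, R, y):
    """Sketch of the ETKF weight-space analysis (toy, illustrative).

    E_fore: N_x x N_e forecast ensemble; H, R: linear observation model."""
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1, keepdims=True)
    X = E_fore - x_mean                          # forecast anomalies
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))
    S = R_inv_sqrt @ (H @ X)                     # normalized observed anomalies
    delta = R_inv_sqrt @ (y - (H @ x_mean).ravel())
    hess = (N_e - 1) * np.eye(N_e) + S.T @ S     # Hessian of the cost function
    w = np.linalg.solve(hess, S.T @ delta)       # single Newton step from w = 0
    vals, vecs = np.linalg.eigh(hess)            # transform via eigendecomposition
    T = np.sqrt(N_e - 1) * (vecs @ np.diag(vals ** -0.5) @ vecs.T)
    return x_mean.ravel() + X @ w, X @ T         # filtered mean, posterior anomalies
```

In a scalar check with prior mean 1, prior variance 1, unit observation error, and observation 3, this recovers the Kalman analysis mean 2 and posterior variance 0.5.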

    Smoothing

    Diagram of the filter observation-analysis-forecast cycle.

    • While we can recursively estimate the density of the current state in this fashion, a related question regards the past states.
    • For example, the observation at time \( t_3 \) can be used to update the conditional estimate for the model state at times \( t_0 \), \( t_1 \) and \( t_2 \).
    • This produces a retrospective posterior estimate for the past states.
    • The smoothing estimate will condition all past states within the data assimilation window (DAW), which is the time series of all states considered for estimation purposes.
    • We condition all past estimates on all observations that correspond to times within the DAW, up to the current time.
    • A smoothing estimate may be produced in a variety of ways, exploiting different formulations of the Bayesian problem as a joint or marginal smoother.
    • The implementation also depends on whether the DAW is fixed in time, or if this window is shifted in time sequentially.

    Sequential Smoothing

    Diagram of the filter observation-analysis-forecast cycle.

    • In the above, we see a schematic of how the DAW is sequentially shifted in time for a general fixed-lag smoother.
    • Generally, we will consider a DAW of lag length \( L \times \Delta t \), with a sequential shift of length \( S\times \Delta t \).
    • After a smoothing cycle has been completed as described before, we will shift the DAW by \( S \times \Delta t \);
      • this will discard the \( S \) oldest states from the DAW, while \( S \) new states and observations enter the DAW.
    • In the figure above, the \( S \) new states and observations that enter the DAW are filled in black.
    • The above fixed-lag formalism allows us to treat the smoothing problem like sequential filtering.
    • This can come with large benefits to the precision of estimation by utilizing a full time series to form conditional estimates;
      • however, it can also become extremely costly to produce conditional estimates this way when \( S \) is small and \( L \) is large.
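The DAW shift described above amounts to a bounded queue over the lagged states; a minimal sketch, with illustrative names, could look like:

```python
from collections import deque

def shift_daw(daw, new_states, L):
    """Shift a fixed-lag DAW: the S = len(new_states) oldest entries are
    discarded and the new states enter, keeping at most L lagged states.
    (Illustrative sketch; entries may be any objects, e.g. ensemble arrays.)"""
    shifted = deque(daw, maxlen=L)   # bounded queue drops the oldest entries
    shifted.extend(new_states)
    return list(shifted)
```

For example, shifting a window of lag L=3 by S=2 discards the two oldest states as the two new states enter.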

    The ETKS

    • In the ETKF formalism, we can define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),

      \[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where in the above we would say that \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]

    • We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{post}_{k|k} \);

      • under the linear-Gaussian model, we furthermore have that

      \[ \begin{align} \mathbf{E}^\mathrm{post}_{k |L} = \mathbf{E}^\mathrm{post}_{k|L-1}\boldsymbol{\Psi}_{L} & & \mathbf{E}^\mathrm{post}_{k|K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}). \end{align} \]

    • Then we can perform a retrospective smoothing analysis on all past states stored in memory by using the latest right-transform update from the filtering step.

    • This form of retrospective analysis is the basis of the ensemble transform Kalman smoother (ETKS)6.

    6. Evensen, G., & Van Leeuwen, P. J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6), 1852-1867.

    The ETKS

    • The ETKS takes advantage of the simple form of the retrospective, right-transform analysis by including an additional, inner loop of the filtering cycle.
    Diagram of the filter observation-analysis-forecast cycle.
    • In the above, time is the horizontal axis where right moves forward in time.
    • At each time, we produce the standard filtering estimate by computing \( \boldsymbol{\Psi}_L \) from the cost function, and updating the forecast \[ \mathbf{E}^\mathrm{filt}_L = \mathbf{E}_L^\mathrm{fore} \boldsymbol{\Psi}_L. \]
    • The information of incoming observations is passed backward in time using the right-transform to condition the ensemble at past times: \[ \begin{align} \mathbf{E}^\mathrm{post}_{k|L} = \mathbf{E}^\mathrm{post}_{k|L-1} \boldsymbol{\Psi}_L. \end{align} \]
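The retrospective analysis above is a single matrix product per lagged state; a minimal sketch (illustrative names, not the authors' implementation):

```python
import numpy as np

def etks_retrospective_update(lagged_ensembles, Psi_L):
    """Retrospective ETKS analysis sketch: apply the latest filtering
    right-transform Psi_L to every lagged ensemble stored in the DAW,
    i.e. E^post_{k|L} = E^post_{k|L-1} Psi_L for each past time k."""
    return [E_k @ Psi_L for E_k in lagged_ensembles]
```

Because the update is a right-multiplication in the ensemble dimension, its cost is independent of how expensive the forecast model is.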

    The ETKS

    • The information in the posterior estimate thus flows in reverse time to the lagged states stored in memory, but the information flow is unidirectional in this scheme.

    • The improved posterior estimate for the lagged states does not improve the filtering estimate in the perfect, linear-Gaussian model:

      \[ \begin{align} \mathbf{M}_{k} \mathbf{E}_{k-1|L}^\mathrm{post}& = \mathbf{M}_{k} \mathbf{E}_{k-1|k-1}^\mathrm{post} \prod_{j=k}^{L} \boldsymbol{\Psi}_j \\ & =\mathbf{E}_{k|k}^\mathrm{post}\prod_{j=k+1}^L \boldsymbol{\Psi}_j\\ & = \mathbf{E}_{k|L}^\mathrm{post}. \end{align} \]

    • This demonstrates that the effect of the conditioning on the information from the observation is covariant with the dynamics.

    • In fact, in the case of the perfect, linear-Gaussian model, the backward-in-time conditioning is equivalent to the reverse time model forecast \( \mathbf{M}_k^{-1} \), as demonstrated by Rauch, Tung and Striebel (RTS).7

    7. Jazwinski, A. H. (2007). Stochastic processes and filtering theory. Courier Corporation. See example 7.8, chapter 7.

    The 4D cost function

    • This covariance of the conditioning with the model dynamics no longer holds in the case of nonlinear dynamics or model error.

    • Re-initializing the DA cycle in a perfect, nonlinear model with the smoothed conditional ensemble estimate \( \mathbf{E}^\mathrm{post}_{0|L} \) can dramatically improve the performance of the subsequent forecast and filtering statistics.

    • Let us denote the composition of the model forecast as,

      \[ \begin{align} \mathcal{M}_{k:m} : = \mathcal{M}_k \circ \cdots \circ \mathcal{M}_{m} & & \mathbf{M}_{k:m} := \mathbf{M}_{k}\cdots \mathbf{M}_{m} \end{align} \]

    • Particularly, this exploits the mismatch in perfect, nonlinear dynamics between

      \[ \begin{align} \mathcal{M}_{L:1}\left(\mathbf{E}_{0|L}^\mathrm{post}\right):= \mathcal{M}_L \circ \cdots \circ \mathcal{M}_1\left(\mathbf{E}_{0|L}^\mathrm{post}\right) \neq \mathbf{E}_{L}^\mathrm{filt}. \end{align} \]

    The 4D cost function

    • The effectiveness of the linear-Gaussian approximation strongly depends on the length of the forecast window \( \Delta t \);

      • for sufficiently small \( \Delta t \), the densities are well-approximated with Gaussians, yet there are deformations induced due to nonlinearity.
    • If the dynamics for the evolution of the densities are weakly nonlinear, re-initializing the model forecast with the posterior estimate (under the linear-Gaussian cost function) can bring new information into the forecast states in the next DAW.

    • This has been exploited to a great extent by utilizing the 4D-cost function, in which the filtering MAP cost function is extended over multiple observations simultaneously, and in terms of a lagged state directly.

    The 4D cost function

    • Suppose now we want to write \( p(\pmb{x}_{1:L}\vert \pmb{y}_{1:L}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.

    • We will suppose that there is an arbitrary shift \( S\geq 1 \).

    • Using a Bayesian analysis like before, we can write

      \[ \begin{align} p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{p(\pmb{x}_0 \vert \pmb{y}_{0:L-1})}_{(1)} \underbrace{\left[\prod_{k=L-S+1}^L p(\pmb{y}_k \vert \pmb{x}_k) \right]}_{(2)} \underbrace{\left[\prod_{k=1}^L \pmb{\delta}\left\{\pmb{x}_k - \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right) \right\} \right]}_{(3)} \end{align} \] where

      1. is the marginal for \( \pmb{x}_0 \) of the last joint smoothing posterior \( p(\pmb{x}_{0:L-1}\vert\pmb{y}_{0:L-1}) \);
      2. is the joint likelihood of the incoming observations to the current DAW, given the background forecast;
      3. is the free-forecast with the perfect model \( \mathcal{M}_k \).
    • Using the fact that \( p(\pmb{x}_{1:L} \vert \pmb{y}_{1:L} ) \propto p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) \), we can chain together a recursive fixed-lag smoother, sequentially across DAWs.

    The 4D cost function

    • Under the linear-Gaussian assumption, the resulting cost function takes the form

      \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_0^\mathrm{post} + \mathbf{X}^\mathrm{post}_0 \pmb{w} - \hat{\pmb{x}}^\mathrm{post}_0 \parallel_{\left(\mathbf{P}^\mathrm{post}_{0|L-1}\right)^{-1}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} }\hat{\pmb{x}}^\mathrm{post}_0 - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} \mathbf{X}^\mathrm{post}_0 \pmb{w} \parallel_{\mathbf{R}^{-1}_k}^2 } } \\ \Leftrightarrow& & \widetilde{\mathcal{J}}(\pmb{w}) &= \frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\parallel^2 \end{alignat} \] where the weights \( \pmb{w} \) give the update to the initial state \( \pmb{x}_{0|L-1} \).

    • In the linear-Gaussian case, the solution can again be found by a single iteration of Newton's descent for the cost function above,

      \[ \begin{align} \overline{\pmb{w}} &= \pmb{0} - \mathbf{H}_{\widetilde{\mathcal{J}}}^{-1}\nabla_{\pmb{w}} \widetilde{\mathcal{J}};\\ \mathbf{T} &= \sqrt{N_e - 1}\, \mathbf{H}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}};\\ \mathbf{P}_{0|L}^\mathrm{post} &= \left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)\left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)^\top / \left(N_e - 1\right). \end{align} \]

    The 4D cost function

    • When the state and observation models are nonlinear, using \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) in the cost function, the solution can be found iteratively, converging to a local minimum with recursive Newton steps as on the last slide.

    • This approach is at the basis of the iterative ensemble Kalman smoother (IEnKS)8, which produces a more accurate solution than the linear approximation when the models are highly nonlinear;

      • this also allows an easy generalization to optimizing hyper-parameters for the empirical density represented by the ensemble as a Gaussian mixture.
    • Forming a Gaussian mixture model increases the variance of the estimator and reduces the bias of the linear-Gaussian approximation;

      • this has been used variously to correct for sampling error due to nonlinearity9, correct model error in the form of stochasticity and parametric error 10, and to design localization analysis in the spatially extended state11.
    8. Bocquet, M., & Sakov, P. (2014). An iterative ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society, 140(682), 1521-1535.
    9. Bocquet, M. (2011). Ensemble Kalman filtering without the intrinsic need for inflation. Nonlinear Processes in Geophysics, 18(5), 735-750.
    10. Raanes, P. N., Bocquet, M., & Carrassi, A. (2019). Adaptive covariance inflation in the ensemble Kalman filter by Gaussian scale mixtures. Quarterly Journal of the Royal Meteorological Society, 145(718), 53-75.
    11. Lorenc, A. C. (2003). The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quarterly Journal of the Royal Meteorological Society, 129(595), 3183-3203.
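The iterative solution of the 4D cost function can be sketched in a toy Gauss-Newton loop in the style of the bundle variant of the IEnKS. This sketch assumes a single observation at the end of the DAW and an identity observation operator; `forecast` stands in for \( \mathcal{M}_{L:1} \), and every iteration reruns this ensemble forecast (the cost the SIETKS is designed to avoid). All names are illustrative:

```python
import numpy as np

def ienks_bundle(E0, forecast, y, R_inv_sqrt, n_iter=8, eps=1e-4):
    """Toy Gauss-Newton iteration in the style of the bundle IEnKS.

    E0: N_x x N_e initial ensemble; `forecast` maps an ensemble through
    M_{L:1}; a single observation y is taken at the end of the DAW with
    an identity observation operator (an assumption of this sketch)."""
    N_e = E0.shape[1]
    x0 = E0.mean(axis=1, keepdims=True)
    X0 = E0 - x0
    w = np.zeros(N_e)
    for _ in range(n_iter):
        # Ensemble about the current iterate; anomalies scaled by eps
        # approximate the tangent-linear model by finite differences.
        E_i = x0 + (X0 @ w)[:, None] + eps * X0
        E_L = forecast(E_i)                       # rerun the ensemble forecast
        xbar_L = E_L.mean(axis=1, keepdims=True)
        S = R_inv_sqrt @ (E_L - xbar_L) / eps     # sensitivity of obs to w
        delta = R_inv_sqrt @ (y - xbar_L.ravel())
        grad = (N_e - 1) * w - S.T @ delta
        hess = (N_e - 1) * np.eye(N_e) + S.T @ S
        w = w - np.linalg.solve(hess, grad)       # Gauss-Newton step
    return x0.ravel() + X0 @ w                    # optimized initial mean
```

For a linear model the loop converges in one step to the Kalman smoother solution, since the cost function is exactly quadratic.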

    The single iteration ensemble transform Kalman smoother (SIETKS)

    • While accuracy can increase with iterations, every iteration of the 4D cost function comes at the cost of the ensemble forecast.

    • In synoptic meteorology, e.g., \( \Delta t \approx 6 \) hours, the linear-Gaussian approximation of the evolution of the densities is actually an adequate approximation;

      • iterating over the nonlinear dynamics may not be justified by the improvement in the forecast statistics.
    • However, the iterative optimization over hyper-parameters in the filtering step of the classical ETKS can be run without the additional cost of model forecasts.

      • This is likewise the case for a nonlinear observation operator in the filtering step.
    • Iterative optimization of the filtering cost function scales with the cost of the eigenvalue decomposition of \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).

    • Subsequently, the retrospective analysis in the form of the filtering right-transform can be applied to condition the initial ensemble

      \[ \begin{align} \mathbf{E}^\mathrm{post}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{post} \boldsymbol{\Psi}_L \end{align} \]

    • As in the IEnKS, we initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.

      • This leads to a single iteration of the model forecast \( {\color{#1b9e77} {\mathcal{M}_{L:1} } } \), while still optimizing over the nonlinear filtering cost function.

    The single iteration ensemble transform Kalman smoother (SIETKS)

    • Compared to the classical ETKS, this adds an outer loop to the filtering cycle to produce the posterior analysis.
    Diagram of the filter observation-analysis-forecast cycle.
    • The information flows from the filtered state back in time from the retrospective analysis.
    • This re-analyzed state becomes the initialization for the next cycle over the shifted DAW, carrying this information forward.
    • The iterative cost function is only solved in the filtering estimate for the new observations entering the DAW.
      • Combined with the retrospective analysis, this comes without the cost of iterating the model forecast over the DAW.
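The outer-loop structure above can be sketched as follows; `forecast` and `analysis` stand in for \( \mathcal{M}_{L:1} \) and the (possibly iteratively optimized) filtering analysis, and all names are illustrative assumptions rather than the presented implementation:

```python
import numpy as np

def sietks_cycle(E0_post, forecast, analysis, y_new):
    """One SIETKS cycle (lag L, shift 1), sketched for illustration.

    A single sweep of the ensemble forecast over the DAW produces the
    forecast statistics; `analysis` solves the filtering cost function,
    returning the filtered ensemble and the right-transform Psi; the
    retrospective analysis re-initializes the next cycle without any
    further model forecasts."""
    E_fore = forecast(E0_post)               # single ensemble forecast M_{L:1}
    E_filt, Psi = analysis(E_fore, y_new)    # filtering analysis at time L
    E0_next = E0_post @ Psi                  # retrospective re-analysis of prior
    return E_filt, E0_next
```

The key design choice is that the only model integration per cycle is the single call to `forecast`; iterating `analysis` (for hyper-parameters or a nonlinear observation operator) costs only ensemble-dimension linear algebra.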

    The single iteration ensemble transform Kalman smoother (SIETKS)

    • By framing the development of this method in the Bayesian MAP formalism for fixed-lag smoothing, we can discuss its novelty.
    • Similar to the IEnKS, this estimates the joint posterior over the DAW via the marginal for \( \pmb{x}_{0|L-1}^\mathrm{post} \).
    • However, instead of solving the 4D cost function which depends on iterations over the model, this sequentially solves the filtering cost function and applies a retrospective analysis.
    • A similar retrospective analysis and re-initialization has been performed in some notable examples:
      • The Running In Place (RIP) smoother12:
        • The RIP is an iterative scheme over the model forecast, used to spin up the LETKF;
        • this uses a lag of \( L=1 \) and multiple iterations of the filtering step to cold-start a filtering cycle.
      • The One Step Ahead (OSA) smoother13
        • The OSA is a single iteration scheme, using a lag of \( L=1 \) and performing a retrospective analysis followed by a second analysis of the re-initialized state.
        • Without model error, the second analysis of the re-initialized state reduces to the free forecast forward-in-time.
    • With a single iteration, a perfect model and a lag of \( L=1 \), all these schemes coincide.
    • Neither of the other schemes handles a lag of \( L>1 \), multiple data assimilation (MDA) schemes, or iterative optimization of the filtering cost function for ensemble hyper-parameters.
    12. Kalnay, E., & Yang, S. C. (2010). Accelerating the spin‐up of ensemble Kalman filtering. Quarterly Journal of the Royal Meteorological Society, 136(651), 1644-1651.
    13. Desbouvries, F., Petetin, Y., & Ait-El-Fquih, B. (2011). Direct, prediction-and smoothing-based Kalman and particle filter algorithms. Signal processing, 91(8), 2064-2077.

    Single layer Lorenz-96 model

    Image of global atmospheric circulation.

    Courtesy of: Kaidor via Wikimedia Commons (CC 3.0)

    • We simulate the performance of the SIETKS versus:
      • the ETKS,
      • the “linear” IEnKS (LIEnKS) using a single iteration, and
      • the fully iterative IEnKS.
    • Our study system is the Lorenz-96 model14.
      • The system is fully observed with \( \mathbf{R}= \mathbf{I} \) and \( \Delta t = 0.05 \).
      • The observation map will be defined as \( \mathcal{H}(\pmb{x}) = \pmb{x} \) when it is linear.
    14. Lorenz, E. N. (1996, September). Predictability: A problem partly solved. In Proc. Seminar on predictability (Vol. 1, No. 1).

    Numerical demonstrations – tuned inflation

    Heat plot.

    Numerical demonstrations – adaptive inflation

    Heat plot.

    Numerical demonstrations – tuned inflation

    line.

    Numerical demonstrations – adaptive inflation

    line.

    Numerical demonstrations – nonlinear observations (SDA)

    Heat plot.

    Numerical demonstrations – nonlinear observations (MDA)

    Heat plot.

    Numerical demonstrations – nonlinear observations (SDA)

    Heat plot.

    Numerical demonstrations – nonlinear observations (MDA)

    Heat plot.

    Numerical demonstrations – nonlinear observations (SDA)

    Heat plot.

    Numerical demonstrations – versus shift size (MDA)

    Heat plot.

    Numerical demonstrations – versus shift size (SDA)

    Heat plot.

    Numerical demonstrations – versus shift size (SDA)

    Heat plot.

    Summary of results in progress

    • The single-iteration ensemble transform Kalman smoother (SIETKS) is an outer loop re-organization of the filtering steps of other smoothing schemes.

    • This uses the lagged, marginal posterior of 4D iterative schemes, but iterates in the filtering step of the classical EnKS.

      • This transform is then applied as a retrospective analysis on the states at the beginning of the DAW.
    • The SIETKS sequentially solves the filtering cost function instead of the 4D cost function.

      • One benefit is that iteratively solving the filtering cost function does not depend on the nonlinear forecast model \( \mathcal{M}_{L:1} \)
      • This provides a more precise forecast with adaptive inflation than the LIEnKS in the weakly nonlinear regime.
      • Another benefit is that this makes long MDA windows more stable than the LIEnKS.
      • This comes with additional cost over the LIEnKS, but one that scales with the number of singular value decompositions of the Hessian \( \mathbf{H}_{\widetilde{\mathcal{J}}}\in \mathbb{R}^{N_e \times N_e} \).
    • A key question is whether this can be used to iteratively solve for the localization hyper-parameters that may be used in an operational forecasting system.

      • Localization in the 4D cost function is nontrivial;
      • we are currently studying whether an iterative localization optimization can be implemented in the filtering cost function to take advantage of the single-iteration formalism.
    • Additionally, we are working to extend the method to the case of stochastic and parametric model error.